Optimizing a Conjugate Gradient Solver with Non-Blocking Collective Operations

Authors

  • Torsten Hoefler
  • Peter Gottschling
  • Wolfgang Rehm
  • Andrew Lumsdaine
Abstract

This paper presents a case study on the applicability and usage of non-blocking collective operations. These operations make it possible to overlap communication with computation and to avoid unnecessary synchronization. We introduce our NBC library, a portable, low-overhead implementation of non-blocking collectives on top of MPI-1. We demonstrate how easy the NBC library is to use by optimizing a conjugate gradient solver with only minor changes to the traditional parallel implementation of the program. The optimized solver runs up to 34% faster and is able to overlap most of the communication. We show that, because of this overlap, there is no performance difference between Gigabit Ethernet and InfiniBand for our calculation.
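The overlap pattern described in the abstract can be illustrated with a small, single-process sketch. It is not the NBC library or MPI: the "ranks" are simply slices of a list, and `start_allgather` is a hypothetical stand-in that uses a Python thread to mimic a non-blocking collective that returns a handle. The matrix (a 1D Laplacian) and the choice to overlap the gather with the local part of the matrix-vector product are assumptions made for illustration, not details taken from the paper.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a non-blocking collective (NOT the NBC or MPI API).
_pool = ThreadPoolExecutor(max_workers=1)

def start_allgather(slices):
    """Begin gathering every simulated rank's slice; return a handle at once."""
    return _pool.submit(lambda: [v for s in slices for v in s])

def matvec_overlapped(p_slices):
    """y = A p for the 1D Laplacian (2 on the diagonal, -1 beside it)."""
    handle = start_allgather(p_slices)        # 1. start the communication
    y_slices = []
    for s in p_slices:                        # 2. compute with local data only
        y = [2.0 * v for v in s]
        for i in range(len(s)):
            if i > 0:
                y[i] -= s[i - 1]
            if i + 1 < len(s):
                y[i] -= s[i + 1]
        y_slices.append(y)
    p = handle.result()                       # 3. wait for the gathered vector
    offset = 0
    for s, y in zip(p_slices, y_slices):      # 4. add terms that need remote values
        end = offset + len(s)
        if offset > 0:
            y[0] -= p[offset - 1]
        if end < len(p):
            y[-1] -= p[end]
        offset = end
    return y_slices

def dot(a_slices, b_slices):
    """Global dot product over all simulated ranks."""
    return sum(x * y for a, b in zip(a_slices, b_slices) for x, y in zip(a, b))

def cg(b_slices, tol=1e-10, max_iter=1000):
    """Plain conjugate gradient for A x = b, starting from x = 0."""
    x = [[0.0] * len(s) for s in b_slices]
    r = [list(s) for s in b_slices]
    p = [list(s) for s in b_slices]
    rr = dot(r, r)
    for _ in range(max_iter):
        if rr <= tol * tol:                   # residual norm below tolerance
            break
        q = matvec_overlapped(p)
        alpha = rr / dot(p, q)
        for xs, rs, ps, qs in zip(x, r, p, q):
            for i in range(len(xs)):
                xs[i] += alpha * ps[i]
                rs[i] -= alpha * qs[i]
        rr_new = dot(r, r)
        beta = rr_new / rr
        for rs, ps in zip(r, p):
            for i in range(len(ps)):
                ps[i] = rs[i] + beta * ps[i]
        rr = rr_new
    return x
```

Calling `cg([[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 1.0]])` solves an 8-unknown system split across three simulated ranks. In a real MPI program, steps 1 and 3 would be the start and wait calls of a non-blocking collective, and any speedup would depend on how much local work is available to hide the communication.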

Similar resources

Parallel scaling of Teter’s minimization for Ab Initio calculations

We propose a parallelization scheme for the conjugate gradient method by Teter et al. and report a detailed analysis of its scalability. We use MPI collective operations exclusively to take advantage of optimized collective implementations with possible hardware support. Our parallel conjugate gradient calculation can be applied in addition to the already implemented parallelism in the applica...

MPI collectives at scale

Collective operations improve the performance and reduce the code complexity of many applications parallelized with the message-passing interface (MPI) paradigm. In this article, we investigate the impact of load imbalance on the performance of collective operations and the possibility of hiding parallel overhead caused by a collective communication pattern, by overlapping the communication with c...

Towards Automatic Support of Parallel Sparse

In this paper, we present a generic matrix class in Java and a runtime environment with continuous compilations aiming to support automatic parallelization of sparse computations on distributed environments. Our package comes with a collection of matrix classes including operators of dense matrix, sparse matrix, and parallel matrix on distributed memory environments. In our environment, a progr...

A Case for Standard Non-blocking Collective Operations

In this paper we make the case for adding standard non-blocking collective operations to the MPI standard. The non-blocking point-to-point and blocking collective operations currently defined by MPI provide important performance and abstraction benefits. To allow these benefits to be simultaneously realized, we present an application programming interface for non-blocking collective operations i...

Comparison the Sensitivity Analysis and Conjugate Gradient algorithms for Optimization of Opening and Closing Angles of Valves to Reduce Fuel Consumption in XU7/L3 Engine

This study compares the results and convergence rates of the sensitivity analysis and conjugate gradient algorithms for reducing fuel consumption and increasing engine performance by optimizing the timing of valve opening and closing in the XU7/L3 engine. In this study, considering the strength and accuracy of the GT-POWER simulation software in research on the internal combustion engine...

Journal:
  • Parallel Computing

Volume 33

Publication date: 2006